01:55
2026-05-30
github.com
large-language-models
Tuning CPU-only Qwen3-30B inference with an IBM Quantum sampling loop
A developer achieved 14.03 generation tokens per second running the Qwen3-30B-A3B-Instruct Mixture-of-Experts LLM on a 2017 Intel MacBook Air with only 8 GB of RAM and no GPU, using an IBM Quantum samplin…